Format for Search Results (Circle One):
PAPER DISK EMAIL 
Where have you searched so far?
USP DWPI EPO JPO ACM IBM TDB 
IEEE INSPEC SPI Other 



Is this a "Fast & Focused" Search Request? (Circle One) YES NO 

A "Fast & Focused" Search is completed in 2-3 hours (maximum). The search must be on a very specific topic and 
meet certain criteria. The criteria are posted in EIC2100 and on the EIC2100 NPL Web Page at
http://ptoweb/patents/stic/stic-tc2100.htm.

What is the topic, novelty, motivation, utility, or other specific details defining the desired focus of this search? Please 
the topic. Please attach a copy of the abstract, background, brief summary, pertinent claims and any citations of
relevant art you have found. 
Set   Items      Description
S1    6132121    PARTIAL? OR FUZZY? OR PORTION? OR SIGNIFICANT? OR PORTION?
                 OR FRACTION? OR FRAGMENT?
S2    8915834    MATCH? OR QUER? OR SEARCH? OR RETRIEV? OR LOCAT? OR IDENTIF?
S3    10406846   STRING? OR SEARCHSTRING? OR CHARACTER? OR ALPHANUMERIC? OR
                 LETTER? OR WORD? OR TERM? OR PHRASE?
S4    28275      (SINGLE OR ONE OR INDIVIDUAL? OR UNIQUE?)(N)(OCCUR? OR APPEAR?
                 OR MATCH?)
S5    416469     S2(N)(ENGINE? OR SOFTWARE? OR APPLICATION? OR SYSTEM? OR
                 PROGRAM? OR CRAWLER? OR IA OR BOT OR ROBOT OR AGENT? OR TOOL?)
                 OR SEARCHENGINE?
S16
S15 NOT PD=20010405:20040409



File 275:Gale Group Computer DB(TM) 1983-2004/Apr 09
         (c) 2004 The Gale Group
File  47:Gale Group Magazine DB(TM) 1959-2004/Apr 09
         (c) 2004 The Gale group
File 636:Gale Group Newsletter DB(TM) 1987-2004/Apr 09
         (c) 2004 The Gale Group
File  16:Gale Group PROMT(R) 1990-2004/Apr 09
         (c) 2004 The Gale Group
File 624:McGraw-Hill Publications 1985-2004/Apr 08
         (c) 2004 McGraw-Hill Co. Inc
File 484:Periodical Abs Plustext 1986-2004/Apr W1
         (c) 2004 ProQuest
File 613:PR Newswire 1999-2004/Apr 09
         (c) 2004 PR Newswire Association Inc
File 813:PR Newswire 1987-1999/Apr 30
         (c) 1999 PR Newswire Association Inc
File 696:DIALOG Telecom. Newsletters 1995-2004/Apr 08
         (c) 2004 The Dialog Corp.
File 621:Gale Group New Prod.Annou.(R) 1985-2004/Apr 09
         (c) 2004 The Gale Group
File 674:Computer News Fulltext 1989-2004/Apr W1
         (c) 2004 IDG Communications
File 369:New Scientist 1994-2004/Apr W1
         (c) 2004 Reed Business Information Ltd.
File 160:Gale Group PROMT(R) 1972-1989
         (c) 1999 The Gale Group
File  15:ABI/Inform(R) 1971-2004/Apr 09
         (c) 2004 ProQuest Info&Learning
File  13:BAMP 2004/Mar W3
         (c) 2004 The Gale Group
File 647:CMP Computer Fulltext 1988-2004/Mar W4
         (c) 2004 CMP Media, LLC
File 148:Gale Group Trade & Industry DB 1976-2004/Apr 09
         (c) 2004 The Gale Group



16/3,K/1 (Item 1 from file: 47)

DIALOG (R) File 47: Gale Group Magazine DB(TM) 
(c) 2004 The Gale group. All rts. reserv. 

05862119 SUPPLIER NUMBER: 63842650 (USE FORMAT 7 OR 9 FOR FULL TEXT) 

Text Retrieval Products for Libraries. (Technology Information) (Statistical 
Data Included) 

Saffady, William 

Library Technology Reports, 36, 2, 5 
March, 2000 

DOCUMENT TYPE: Statistical Data Included ISSN: 0024-2586 

LANGUAGE: English RECORD TYPE: Fulltext 

WORD COUNT: 35970 LINE COUNT: 03217 




... for the ZyFIND component's excellent repertoire of information
retrieval capabilities. Few, if any, text retrieval programs can match
ZyIMAGE's ability to import word processing files, spreadsheets, and
databases created by popular and all-but-forgotten programs. ZyFIND
combines familiar components, such as phrase searching and Boolean
operators, with such unusual features as quorum searching and the
application of relational expressions to numbers embedded in text files.
Other notable retrieval capabilities include versatile proximity
commands, root word searching, suffix matching, single and multiple
wildcard characters, fuzzy searching, and a preconfigured thesaurus
for synonym selection.

Most program operations, including indexing and searching, are...



16/3,K/4 (Item 1 from file: 15)

DIALOG(R) File 15:ABI/Inform(R)

(c) 2004 ProQuest Info&Learning. All rts. reserv.



02510113 258853721 

Project delivery system selection: A case-based reasoning framework 

Ribeiro, Francisco Loforte 

Logistics Information Management v14n5/6 PP: 367-375 2001
ISSN: 0957-6053 JRNL CODE: LIM
WORD COUNT: 4951

...TEXT: score is calculated and the highest ranking cases are then
presented to the user.

The system searches the project delivery system cases using the
hierarchical search algorithm, first looking for cases exactly matching the
specified new case problem, and then for partial matches. Exactly
matching are those whose are the same as those specified for a new case
problem. Partial matches, in order of preference, are project delivery
system cases matching one or two, or the three indices fully or
partially. Given a description of the new...



Set   Items     Description
S1    1845078   PARTIAL? OR FUZZY? OR PORTION? OR SIGNIFICANT? OR PORTION?
                OR FRACTION? OR FRAGMENT?
S2    1510430   MATCH? OR QUER? OR SEARCH? OR RETRIEV? OR LOCAT? OR IDENTIF?
S3    2435468   STRING? OR SEARCHSTRING? OR CHARACTER? OR ALPHANUMERIC? OR
                LETTER? OR WORD? OR TERM? OR PHRASE?
S4    2250      (SINGLE OR ONE OR INDIVIDUAL? OR UNIQUE?)(N)(OCCUR? OR APPEAR?
                OR MATCH?)
S5    69        S1 AND S2 AND S3 AND S4
S6    385       S2 AND S4 AND S3
S7    4347      S1(2N)S2 AND S3
S8    24        S5 AND IC=G06F?
S9    19        S8 NOT AD>20010405
S10   10        S6 AND S7
S11   26        S10 OR S9
S12   22        S11 NOT AD>20010405
S13   22        IDPAT (sorted in duplicate/non-duplicate order)
S14   22        IDPAT (primary/non-duplicate records only)
File 347:JAPIO Nov 1976-2003/Dec (Updated 040402)
         (c) 2004 JPO & JAPIO
File 350:Derwent WPIX 1963-2004/UD,UM &UP=200419
         (c) 2004 Thomson Derwent




14/5/1 (Item 1 from file: 350) 

DIALOG (R) File 350:Derwent WPIX 

(c) 2004 Thomson Derwent . All rts. reserv. 

015833463 **Image available** 

WPI Acc No: 2003-895667/200382 

XRPX Acc No: N03-714602 

Key phrase producing method for multimedia applications, involves 
processing feature vectors generated for each frames, and applying 
predetermined rules to marked vectors in order to select label as key 
phrase of song 

Patent Assignee: HEWLETT-PACKARD DEV CO LP (HEWP ) 

Inventor: CHITS M; LOGAN B T 

Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No   Kind  Date      Applicat No    Kind  Date      Week
US 6633845  B1    20031014  US 2000545893  A     20000407  200382 B

Priority Applications (No Type Date): US 2000545893 A 20000407
Patent Details:

Patent No   Kind  Lan  Pg  Main IPC     Filing Notes
US 6633845  B1         16  G10L-015/28

Abstract (Basic): US 6633845 B1

NOVELTY - The method involves dividing a part of a song into a set 
of frames, and generating a feature vector for each frame. The feature 
vectors are processed to identify songs structure. The vectors 
related with different structural parts of the song having different 
labels are marked. Predetermined rules are applied to the marked 
vectors for selecting single occurrence of a chosen label as a key 
phrase (214) of the song. 

DETAILED DESCRIPTION - Each feature vector has parameters whose 
values are characteristics of that portion of the song contained 
within the respective frame. INDEPENDENT CLAIMS are also included for 
the following: 

(a) a system to produce a key phrase for a song 

(b) a computer readable medium to produce a key phrase for a 
song . 

USE - Used for producing key phrase in multimedia applications, 
databases and search engines. 

ADVANTAGE - The method automatically generates the key phrase or 
summary of a song. The method employs the summary as an index to the 
song so that the user can identify the song by hearing the key 
phrases . 

DESCRIPTION OF DRAWING (S) - The drawing shows a block diagram of a 
song summarization system. 
Signal processor (202) 
Vector extraction engine (204) 
Key phrase identifier logic (208) 
Audio input (210) 
Key phrase (214) 
pp; 16 DwgNo 2/7 

Title Terms: KEY; PHRASE ; PRODUCE; METHOD; APPLY; PROCESS; FEATURE; 

VECTOR; GENERATE; FRAME; APPLY; PREDETERMINED; RULE; MARK; VECTOR; ORDER; 

SELECT; LABEL; KEY; PHRASE ; SING 
Derwent Class: P75; P86; T01; W04 
International Patent Class (Main) : G10L-015/28 

International Patent Class (Additional): B41J-003/34; G06F-007/00 ; 

G10G-007/00; G10L-021/06 
File Segment: EPI; EngPI 



14/5/3 (Item 3 from file: 350) 

DIALOG (R) File 350:Derwent WPIX 
(c) 2004 Thomson Derwent. All rts. reserv. 

013979168 **Image available** 

WPI Acc No: 2001-463382/200150 

XRPX Acc No: N01-343477 

Computer readable medium for word processing system, has condensed 
lexion database with data tree having nodes containing reading pair 
identification number and instructions for mapping reading pair ID 
number array 

Patent Assignee: MICROSOFT CORP (MICR-N) 

Inventor: CAI P P; HALSTEAD P H 

Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No   Kind  Date      Applicat No  Kind  Date      Week
US 6175834  B1    20010116  US 98104257  A     19980624  200150 B

Priority Applications (No Type Date): US 98104257 A 19980624
Patent Details:

Patent No   Kind  Lan  Pg  Main IPC     Filing Notes
US 6175834  B1         21  G06F-017/00

Abstract (Basic): US 6175834 B1

NOVELTY - The condensed lexion database (CLD) has data tree having 
nodes, each including reading pair ID numbers (RID) and computer 
executable instructions for mapping RID array onto CLD. Reading pair 
database (RPD) is accessed to match one reading unit in selected 
word , to either of reading units of reading pairs in RPD and matching 

RID is retrieved . Each word is reformed as RID array which is 
mapped onto the CLD. 

DETAILED DESCRIPTION - The medium has reading pair database (RPD) 
having several reading pairs and several reading pair identification 
numbers (RIDs) . Each of the reading pairs have two reading units in two 
writing system respectively. Each of the RIDs correspond to one of the 
reading pairs. The RPD is accessed to match one reading unit of the 
word with reading units in RPD. A reply message is output to indicate 
whether mapping of RID array onto CLD is successful or unsuccessful. 
INDEPENDENT CLAIMS are also included for the following: 

(a) Consistency checking method; 

(b) Common spelling variants generating method; 

(c) Reading pair database generating method 

USE - In identification of inconsistently spelled Japanese words 
in document . 

ADVANTAGE - All acceptable spelling variants of particular Japanese 
word is identified and generated substantially. Spelling variants 
that are used inconsistently with other spelling variants in the same 
document are identified . The statistics of spelling variant uses is 
maintained within particular document which enables consistency checker 
to identify lesser used variants. 

DESCRIPTION OF DRAWING (S) - The figure shows the pictorial 
representation of portions of CLD. 

pp; 21 DwgNo 6/8 

Title Terms: COMPUTER; READ; MEDIUM; WORD ; PROCESS; SYSTEM; CONDENSATION; 

DATABASE; DATA; TREE; NODE; CONTAIN; READ; PAIR; IDENTIFY ; NUMBER; 

INSTRUCTION; MAP; READ; PAIR; ID; NUMBER; ARRAY 
Derwent Class: T01 

International Patent Class (Main) : G06F-017/00 
File Segment: EPI 



14/5/6 (Item 6 from file: 350)

DIALOG (R) File 350:Derwent WPIX 

(c) 2004 Thomson Derwent . All rts. reserv. 



011638359 **Image available** 

WPI Acc No: 1998-055267/199806 

XRPX Acc No: N98-043771 

Method of facilitating access to selectable element on graphical user 
interface - involves matching one or more characters received from 
character based input device with character portion of at least one 
selectable element within multiplicity of lexically unordered selectable 
elements 

Patent Assignee: SUN MICROSYSTEMS INC (SUNM ) 
Inventor: GENTNER D R; JOHNSON E; NIELSEN J 
Number of Countries: 025 Number of Patents: 004 
Patent Family: 



Patent No    Kind  Date      Applicat No  Kind  Date      Week
EP 816990    A2    19980107  EP 97304491  A     19970625  199806
JP 10116294  A     19980506  JP 97184597  A     19970626  199828
US 5884318   A     19990316  US 96670952  A     19960626  199918
US 5963950   A     19991005  US 96670952  A     19960626  199948



Priority Applications (No Type Date) : US 96670952 A 19960626 
Patent Details: 

Patent No Kind Lan Pg Main IPC Filing Notes 
EP 816990 A2 E 22 G06F-003/023 

Designated States (Regional) : AL AT BE CH DE DK ES FI FR GB GR IE IT LI 

LT LU LV MC NL PT RO SE SI 
JP 10116294 A 29 G06F-017/30 

US 5884318 A G06F-017/30 
US 5963950 A G06F-017/30 



Abstract (Basic) : EP 816990 A 

The method involves receiving one or more characters from a 
character based input device. The one or more characters received 
from the character based input device are compared with the 
character portion from one or more of the multiplicity of lexically 
unordered selectable elements. The one or more characters received 
from the character based input device are matched with the 
character portion of at least one selectable element within the 
multiplicity of lexically unordered selectable elements. 

A selectable element which matched the one or more characters 
received from the character based input device is armed. A previously 
armed selectable element is disarmed before arming the selectable 
element which matched the one or more characters received from the 
character based input device. The armed selectable element is selected 
in response to receiving an actuation input signal which indicates the 
armed selectable element should be selected. 

ADVANTAGE - Allows user to quickly search and select a selectable 
element by typing minimum number of character . 

Dwg.5/11 

Title Terms: METHOD; FACILITATE; ACCESS; SELECT; ELEMENT; GRAPHICAL; USER; 

INTERFACE; MATCH ; ONE; MORE; CHARACTER ; RECEIVE; CHARACTER ; BASED; 

INPUT; DEVICE; CHARACTER ; PORTION ; ONE; SELECT; ELEMENT; MULTIPLICITY 

; SELECT; ELEMENT 
Derwent Class: T01 

International Patent Class (Main) : G06F-003/023 ; G06F-017/30 
International Patent Class (Additional) : G06F-003/14 
File Segment: EPI 



Set   Items   Description
S1            PARTIAL? OR FUZZY? OR PORTION? OR SIGNIFICANT? OR PORTION?
              OR FRACTION? OR FRAGMENT?
S2            MATCH? OR QUER? OR SEARCH? OR RETRIEV? OR LOCAT? OR IDENTIF?
S3            STRING? OR SEARCHSTRING? OR CHARACTER? OR ALPHANUMERIC? OR
              LETTER? OR WORD? OR TERM? OR PHRASE?
S4            (SINGLE OR ONE OR INDIVIDUAL? OR UNIQUE?)(N)(OCCUR? OR APPEAR?
              OR MATCH?)
S5            S1 AND S2 AND S3 AND S4
S6            S2 AND S4 AND S3
S7            S1(2N)S2 AND S3
S8            S5 AND IC=G06F?
S9            S8 NOT AD>20010405
S10           S6 AND S7
S11           S10 OR S9
S12           S11 NOT AD>20010405
S13           IDPAT (sorted in duplicate/non-duplicate order)
S14           IDPAT (primary/non-duplicate records only)
S15           S2(N)(ENGINE? OR SOFTWARE? OR APPLICATION? OR SYSTEM? OR
              PROGRAM? OR CRAWLER? OR IA OR BOT OR ROBOT OR AGENT? OR TOOL?)
              OR SEARCHENGINE?
S16   51      S15 AND (S4 OR (MINIMUM OR MINIMAL)()S1)
S17   27      S16 AND IC=(G06F? OR H04L?)
S18   24      S17 NOT S11
S19   13      S18 NOT AD>20010405

File 347:JAPIO Nov 1976-2003/Dec (Updated 040402)

(c) 2004 JPO & JAPIO 
File 350:Derwent WPIX 1963-2004 /UD, UM &UP=200419 

(c) 2004 Thomson Derwent 
19/5/7 (Item 7 from file: 350)

DIALOG (R) File 350:Derwent WPIX 

(c) 2004 Thomson Derwent. All rts. reserv. 

013455712 **Image available** 

WPI Acc No: 2000-627655/200060 

XRPX Acc No: N00-465000 

Information retrieval system using natural language queries in 
Internet, analyzes language based database and natural language query to 
generate database keywords and query keywords, respectively 

Patent Assignee: NOVELL INC (NOVE-N) 

Inventor: AKKER D V D; DE BIE P; DE HITA C R; DEUN K V; GOVAERS E C E; 

LAVIOLETTE S; MACPHERSON M; PLATTEAU F M J 
Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

US 6081774 A 20000627 US 97916628 A 19970822 200060 B 

Priority Applications (No Type Date) : US 97916628 A 19970822 
Patent Details: 

Patent No Kind Lan Pg Main IPC Filing Notes 
US 6081774 A 41 G06F-017/27 

Abstract (Basic) : US 6081774 A 

NOVELTY - A non-real time development system (102) and a real time 
retrieval system (104) morphologically, syntactically and 
linguistically analyze a language based database and natural language 
query, respectively to generate one or more database keywords and query 
keywords, respectively. The database and query keywords represent 
content of language based database and natural language query (160), 
respectively. 

DETAILED DESCRIPTION - The non-real time development system creates 
a database index (130) having one or more content based keywords of the 
database, automatically. The real time retrieval system searches 
the index for query keywords derived from natural language query based 
on user's queries. The non-real time development system comprises a 
software developer's kit for creating database index, utilizing a 
pattern dictionary that includes synonyms and skipwords. A morphous 
syntactic dictionary in the system includes morphological and syntactic 
information for words in the natural language of language based 
database and natural language query. The real time retrieval system 
has a natural language interface (170) that creates one or more query 
keywords utilizing pattern and morphosyntactic dictionaries. A query 
index matcher matches one or more query keywords with one or more 
database keywords. 

USE - For retrieving information from language based database using 
natural language queries in Internet and intranet . 

ADVANTAGE - Enables any software developer to add information 
retrieval system to pre-existing software application to provide a 
user interface that enables user to develop a query in natural 
language. The software developer's kit enables software developers to 
add natural language interface and associated information retrieval 
capability to existing software application without any development 
work . 

DESCRIPTION OF DRAWING (S) - The figure shows functional block 
diagram of information retrieval system . 
Non-real time development system (102) 
Real time retrieval system (104) 
Database index (130) 
Natural language query (160) 
Natural language interface (170) 
pp; 41 DwgNo 1/19 

Title Terms: INFORMATION; RETRIEVAL; SYSTEM; NATURAL; LANGUAGE; QUERY; 

LANGUAGE; BASED; DATABASE; NATURAL; LANGUAGE; QUERY; GENERATE; DATABASE; 

KEYWORD; QUERY; KEYWORD; RESPECTIVE 
Derwent Class: T01 

International Patent Class (Main) : G06F-017/27 
International Patent Class (Additional) : G06F-007/00 



File Segment: EPI 



Set   Items     Description
S1    6282204   PARTIAL? OR FUZZY? OR PORTION? OR SIGNIFICANT? OR PORTION?
                OR FRACTION? OR FRAGMENT?
S2    5036189   MATCH? OR QUER? OR SEARCH? OR RETRIEV? OR LOCAT? OR IDENTIF?
S3    9686031   STRING? OR SEARCHSTRING? OR CHARACTER? OR ALPHANUMERIC? OR
                LETTER? OR WORD? OR TERM? OR PHRASE?
S4    12113     (SINGLE OR ONE OR INDIVIDUAL? OR UNIQUE?)(N)(OCCUR? OR APPEAR?
                OR MATCH?)
S5    189546    S2(N)(ENGINE? OR SOFTWARE? OR APPLICATION? OR SYSTEM? OR
                PROGRAM? OR CRAWLER? OR IA OR BOT OR ROBOT OR AGENT? OR TOOL?)
                OR SEARCHENGINE?
S6    562       S1 AND S2 AND S3 AND S4
S7    30        S1 AND S5 AND S4
S8    53        S1(5N)S2 AND S6
S9    24        S3(2N)S1 AND S6
S10   24        S1(2N)S2 AND S6
S11   72        S7 OR S9 OR S10
S12   56        RD (unique items)
S13   44        S12 NOT PY>2001
S14   43        S13 NOT PD=20010405:20030405
S15   43        S14 NOT PD=20030405:20040409
S16   43        S15 NOT CY>2001

File   8:Ei Compendex(R) 1970-2004/Mar W4
         (c) 2004 Elsevier Eng. Info. Inc.
File  35:Dissertation Abs Online 1861-2004/Mar
         (c) 2004 ProQuest Info&Learning
File 202:Info. Sci. & Tech. Abs. 1966-2004/Feb 27
         (c) 2004 EBSCO Publishing
File  65:Inside Conferences 1993-2004/Apr W1
         (c) 2004 BLDSC all rts. reserv.
File   2:INSPEC 1969-2004/Mar W4
         (c) 2004 Institution of Electrical Engineers
File  94:JICST-EPlus 1985-2004/Mar W3
         (c) 2004 Japan Science and Tech Corp(JST)
File 111:TGG Natl. Newspaper Index(SM) 1979-2004/Apr 09
         (c) 2004 The Gale Group
File 233:Internet & Personal Comp. Abs. 1981-2003/Sep
         (c) 2003 EBSCO Pub.
File   6:NTIS 1964-2004/Apr W1
         (c) 2004 NTIS, Intl Cpyrght All Rights Res
File 144:Pascal 1973-2004/Mar W4
         (c) 2004 INIST/CNRS
File 434:SciSearch(R) Cited Ref Sci 1974-1989/Dec
         (c) 1998 Inst for Sci Info
File  34:SciSearch(R) Cited Ref Sci 1990-2004/Apr W1
         (c) 2004 Inst for Sci Info
File  62:SPIN(R) 1975-2004/Feb W3
         (c) 2004 American Institute of Physics
File  99:Wilson Appl. Sci & Tech Abs 1983-2004/Mar
         (c) 2004 The HW Wilson Co.



16/5/6 (Item 6 from file: 8) 

DIALOG (R) File 8 : Ei Compendex (R) 

(c) 2004 Elsevier Eng. Info. Inc. All rts. reserv. 



00963649 E.I. Monthly No: EI8011083324 E.I. Yearly No: EI80044498 
Title: PARTIAL - MATCH RETRIEVAL IN AN INDEX SEQUENTIAL DIRECTORY. 

Author: Zvegintzov, N. 

Source: Computer Journal v 23 n 1 Feb 1980 p 37-40 

Publication Year: 1980 

CODEN: CMPJA6 ISSN: 0010-4620

Language: ENGLISH 

Journal Announcement: 8011 

Abstract: An algorithm is described which, given an index sequential 
directory of keys, and given a set of partially specified templates, 
retrieves all keys in the directory that match one or more templates. 
Algorithms are given for the common special case where the keys are fixed 
length strings in lexicographic order. The origins, applications, and 
properties of these algorithms are discussed. 7 refs. 

Descriptors: INFORMATION RETRIEVAL SYSTEMS 

Classification Codes: 

723 (Computer Software); 901 (Engineering Profession) 

72 (COMPUTERS & DATA PROCESSING); 90 (GENERAL ENGINEERING) 



16/5/7 (Item 1 from file: 35)

DIALOG (R) File 35 : Dissertation Abs Online 

(c) 2004 ProQuest Inf o&Learning. All rts. reserv. 



01736162 ORDER NO: AADAA- 1 996387 1 

Use of genetic algorithms in information retrieval : Adapting matching 
functions 

Author: Pathak, Praveen A. 
Degree: Ph.D. 
Year: 2000 

Corporate Source/Institution: The University of Michigan (0127) 
Chair: Michael Gordon 

Source: VOLUME 61/03-A OF DISSERTATION ABSTRACTS INTERNATIONAL. 

PAGE 804. 141 PAGES 
Descriptors: INFORMATION SCIENCE ; COMPUTER SCIENCE ; ARTIFICIAL 

INTELLIGENCE 
Descriptor Codes: 0723; 0984; 0800 

Information retrieval systems are complex in nature due to the 
interactions of document, query, and matching subsystems involved in the 
process of retrieval. Researchers have applied probabilistic, 
knowledge-based, and, more recently, artificial intelligence based 
techniques like neural networks and symbolic learning to this problem. Very 
few researchers have tried to use evolutionary algorithms like genetic 
algorithms (GA's). Previous attempts at using GA's have concentrated on
modifying the document representations or modifying the query
representations.

In this research, we explore the possibility of applying GA's to adapt 
the matching functions used in retrieval. We have described a method where 
an overall matching function is achieved by combining the results of the 
individual matching functions. The weights associated with individual 
matching functions have been adapted using GA's. We tested the method on
two document collections. Experiments on these collections suggest that a 
GA based matching function adaptation significantly improves retrieval 
performance compared to the performance obtained by the best individual 
matching function. 

We believe the promising outcomes of the GA based matching function 
adaptation merits continuing research. We briefly present possible areas of 
future research such as simultaneous adaptations of the three subsystems 
involved in retrieval, user profiling using this approach, and evolving new 
matching functions. 
Abstract

In this paper, we first introduced the 
concept of Cartesian product files. We 
then derived a formula for random files. 
A computer simulation experiment was 
performed to compare these two files. So
far as shown by the experimental results, 
the Cartesian product file concept was 
indeed a good one. We also showed that 
the problem of designing an optimal 
Cartesian product file was partially 
related to the problem of finding a minimal 
N- tuple. A method to find minimal N-tuples 
was presented and its properties were 
discussed. 

Section 1. Introduction 

In this paper, we are concerned with
the problem of designing optimal
multi-attribute file systems for partial match
queries [Rivest 1976, Rothnie and Lozano
1974, Liou and Yao 1975, Bentley 1979,
Lee and Tseng 1979, Lin, Lee and Du
1979]. By a multi-attribute file system,
we mean a file system whose records are
characterized by more than one attribute.
By partial match queries, we mean queries
of the following form: Retrieve all
records where
$A_{i_1} = a_{i_1}, A_{i_2} = a_{i_2}, \ldots, A_{i_j} = a_{i_j}$
and $i_1 \neq i_2 \neq \cdots \neq i_j$.

We shall assume that every file is 
divided into buckets. The problem of 
multi-attribute file design can be 
explained by considering the two file 
systems shown in Table 1.1 and Table 1.2. 



Table 1.1 here
Table 1.2 here



In both tables, a query (a,*) denotes a 
query retrieving records with the first 
attribute equal to a and the second 
attribute with any value. Similarly, for 
a 3-attribute file system, a query denoted 
as (*,b,c) denotes a query retrieving all 
records with A2=b and A3=c and A1 can be
of any value. The reader can see that 
the average number of buckets to be 
examined, over all possible queries, is 
2 for the file system in Table 1.2 and 4 
for that in Table 1.1. 

Thus the problem of multi-attribute
file system design for partial match
queries is as follows: Given a set of
multi-attribute records, arrange the
records into the NB buckets in such a way
that the average number of buckets to be
examined, over all possible partial match
queries, is minimized.

The general problem stated above is
rather hard to solve. In this paper, we
shall limit ourselves to the case where
all possible records are present. Note
that every record is characterized by N
attributes $A_1, A_2, A_3, \ldots, A_N$. Let the
domain of attribute $A_i$ be denoted as $D_i$.
Thus the set of all possible records is
$D_1 \times D_2 \times \cdots \times D_N$. In the rest of this paper,
whenever we discuss the partial match
problem, we shall assume that every
possible record in this set $D_1 \times D_2 \times \cdots \times D_N$
is present. If some of the records in
the set $D_1 \times D_2 \times \cdots \times D_N$ are missing, we
consider the optimization of Cartesian
product files with respect to partial
match patterns which were defined by Lin,
Lee and Du [1979].

Section 2. Cartesian Product Files and 
Random Files 

Multi-attribute file system design
for partial match queries has been
considered by many authors. Rivest [1976]
suggested the string homomorphism hashing
(SHH for short) method. Rothnie and
Lozano [1974] suggested the multi-key
hashing (MKH for short) method. Liou and
Yao [1975] suggested the multi-dimensional
directory (MDD for short) method. Lee and
Tseng [1979] suggested the multi-key
sorting (MKS for short) method.

In [Lin, Lee and Du 1979], it was
proved that all of those file designing
methods exhibit one common property:
records in one bucket are similar to one
another. In [Lin, Lee and Du 1979], it
was also pointed out that the file systems
designed by using the SHH, MKH and MDD
methods are all Cartesian product files,
which are defined as follows.

Definition: Let there be N attributes
$A_1, A_2, \ldots, A_N$. Let the domain of $A_i$ be
$D_i$. A Cartesian product file is a
file in which the records in every
bucket are of the form
$D_{1s} \times D_{2s} \times \cdots \times D_{Ns}$,
where $D_{is}$ is a subset of $D_i$.

Example 2.1

Let $D_1 = \{a,b,c,d\} = D_2$. Let $D_{11} = \{a,b\} = D_{21}$.
Let $D_{12} = \{c,d\} = D_{22}$. Then the
following file is a Cartesian product
file.

Bucket 1: $D_{11} \times D_{21}$ = { (a,a), (a,b), (b,a), (b,b) }

Bucket 2: $D_{11} \times D_{22}$ = { (a,c), (a,d), (b,c), (b,d) }

Bucket 3: $D_{12} \times D_{21}$ = { (c,a), (c,b), (d,a), (d,b) }

Bucket 4: $D_{12} \times D_{22}$ = { (c,c), (c,d), (d,c), (d,d) }

The reader can see that the above 
file system is exactly the same file 
system shown in Table 1.2. 
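
As an illustration only, the following Python sketch rebuilds the four buckets of Example 2.1 and counts how many buckets each partial match query touches. The helper name `buckets_examined` and the use of `None` for the unspecified value `*` are assumptions made for this sketch; they do not appear in the paper.

```python
from itertools import product

# Domain blocks from Example 2.1 (D11, D12 partition D1; D21, D22 partition D2).
D11, D12 = ("a", "b"), ("c", "d")
D21, D22 = ("a", "b"), ("c", "d")

# Each bucket is the Cartesian product of one block per attribute.
# Order: D11xD21, D11xD22, D12xD21, D12xD22 (Buckets 1 to 4).
buckets = [set(product(x, y)) for x in (D11, D12) for y in (D21, D22)]

def buckets_examined(query):
    """Count buckets holding at least one record that matches the query.

    A query is a tuple in which None stands for the unspecified value '*'.
    """
    def matches(record):
        return all(q is None or q == r for q, r in zip(query, record))
    return sum(any(matches(rec) for rec in bucket) for bucket in buckets)

# All partial match queries on two attributes: (v, *) and (*, v).
domain = D11 + D12
queries = [(v, None) for v in domain] + [(None, v) for v in domain]
average = sum(buckets_examined(q) for q in queries) / len(queries)
print(average)
```

Every one of the eight queries touches exactly two buckets here, so the average is 2, which agrees with the figure quoted earlier for the file system in Table 1.2.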

Example 2.2

Let $D_1 = \{a,b,c,d,e\}$ and $D_2 = \{a,b,c,d\}$.
Let $D_{11} = \{a,b,c\}$, $D_{12} = \{d,e\}$,
$D_{21} = \{a,b\}$ and $D_{22} = \{c,d\}$. Then the
following file system is a Cartesian
product file system.

Bucket 1: $D_{11} \times D_{21}$ = { (a,a), (a,b), (b,a), (b,b), (c,a), (c,b) }

Bucket 2: $D_{11} \times D_{22}$ = { (a,c), (a,d), (b,c), (b,d), (c,c), (c,d) }

Bucket 3: $D_{12} \times D_{21}$ = { (d,a), (d,b), (e,a), (e,b) }

Bucket 4: $D_{12} \times D_{22}$ = { (d,c), (d,d), (e,c), (e,d) }

Note that in this case, the number of 
records in Bucket 1 is not the same as 
that in Bucket 3. 

It was also pointed out in [Lin, Lee
and Du 1979] that records in a Cartesian
product file form a short spanning path
[Slagle, Chang and Lee 1974]. That is,
records in a bucket of a Cartesian product
file can be ordered into a sequence
$R_1, R_2, \ldots, R_{BZ}$ and for every pair of
consecutive records $R_i$ and $R_{i+1}$ ($1 \le i < BZ$),
these two records are different at only
one attribute. For instance, consider
Bucket 1 in the above example. The
records in this bucket can be arranged into
the following sequence:

(a,a)
(a,b)
(b,b)
(b,a)
(c,a)
(c,b)

Since two consecutive records are different 
at only one attribute, a Cartesian product 
file exhibits the property of clustering 
similar records together. 
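
The short spanning path property can be checked mechanically. The sketch below is a minimal illustration, not part of the original paper; the function names are hypothetical. It tests whether each pair of consecutive records in a sequence differs in exactly one attribute.

```python
def differs_in_one_attribute(r, s):
    """True when records r and s differ at exactly one attribute position."""
    return sum(x != y for x, y in zip(r, s)) == 1

def is_short_spanning_path(sequence):
    """True when every pair of consecutive records differs at one attribute."""
    return all(differs_in_one_attribute(r, s)
               for r, s in zip(sequence, sequence[1:]))

# The ordering of Bucket 1 of Example 2.2 given in the text.
bucket1_path = [("a", "a"), ("a", "b"), ("b", "b"),
                ("b", "a"), ("c", "a"), ("c", "b")]

# Plain lexicographic order of the same records, for contrast.
bucket1_sorted = sorted(bucket1_path)

print(is_short_spanning_path(bucket1_path))
print(is_short_spanning_path(bucket1_sorted))
```

The ordering from the text satisfies the property, whereas plain lexicographic order does not, because (a,b) and (b,a) differ at both attributes.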

It is our conjecture that the Cartesian
product file concept is optimal in the
sense that a Cartesian product file is
always better than a non-Cartesian product
file. We have not been able to prove
this conjecture yet. However, we do have
some results to show the superiority of
Cartesian product files.

Let us call a file where records are
randomly placed in buckets a random file.
In the following, we shall derive a
formula giving the expected number of
buckets accessed over all possible partial
match queries for a random file. Again,
let us assume that our records are
characterized by N attributes $A_1, A_2, \ldots, A_N$
and the domain of $A_i$ is $D_i$. Let the
number of elements in $D_i$ be denoted as $d_i$.
Then the number of records NR is equal to
$d_1 d_2 \cdots d_N$. Let NB denote the number of
buckets. Then the bucket size BZ is
equal to NR/NB. Let $ANB_R$ denote the
expected number of buckets being accessed
over all possible partial match queries
in a random file.

First let us consider a special partial
match query $A_i = a_i$ where $a_i \in D_i$. There
are $d_1 \times d_2 \times \cdots \times d_{i-1} \times d_{i+1} \times \cdots \times d_N$ records
satisfying the condition $A_i = a_i$. Since
each record is randomly assigned to a
bucket, the probability that a bucket
receives a record is $1/NB$. The expected
number of buckets being accessed for this
query is equal to the number of buckets
which are not empty when we randomly
assign $d_1 \times d_2 \times \cdots \times d_{i-1} \times d_{i+1} \times \cdots \times d_N$ records
to NB buckets.

The probability of a bucket being
empty

$$= \left(1 - \frac{1}{NB}\right)^{d_1 d_2 \cdots d_{i-1} d_{i+1} \cdots d_N}$$

= the probability that all $d_1 d_2 \cdots d_{i-1} d_{i+1} \cdots d_N$
records are assigned to other buckets.

The probability that a bucket is not
empty

$$= 1 - \left(1 - \frac{1}{NB}\right)^{d_1 d_2 \cdots d_{i-1} d_{i+1} \cdots d_N}$$

The expected number of buckets which
are not empty

$$= NB \left(1 - \left(1 - \frac{1}{NB}\right)^{d_1 d_2 \cdots d_{i-1} d_{i+1} \cdots d_N}\right)$$

Note that for all $a_i \in D_i$, all partial match
queries $A_i = a_i$ produce the same result.

$ANB_R$ = (the sum, over all partial match
queries, of the expected number of
buckets being accessed for a
partial match query) / (the total
number of different partial match
queries)
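
The expected count of non-empty buckets, NB(1 - (1 - 1/NB)^m) for m matching records, can be sanity-checked with a small simulation. The sketch below is illustrative only: it assumes, as the derivation does, that each record lands in a bucket independently and uniformly at random, and the function names, trial count, and seed are choices made for this example.

```python
import random

def expected_nonempty_buckets(num_records, nb):
    """Closed form: NB * (1 - (1 - 1/NB) ** m) for m independently placed records."""
    return nb * (1 - (1 - 1 / nb) ** num_records)

def simulate_nonempty_buckets(num_records, nb, trials=20000, seed=0):
    """Average number of distinct buckets hit when records are placed at random."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        used = {rng.randrange(nb) for _ in range(num_records)}
        total += len(used)
    return total / trials

# Five matching records spread over four buckets.
print(expected_nonempty_buckets(5, 4))
print(simulate_nonempty_buckets(5, 4))
```

With enough trials the simulated average should settle close to the closed-form value; the independence assumption is what makes the two comparable.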

The total number of partial match 
queries can be found as follows: 

(1) There are $d_1 + d_2 + \cdots + d_N$ partial match
queries which involve exactly one attribute.

(2) There are $d_1 d_2 + d_1 d_3 + \cdots + d_{N-1} d_N$
partial match queries which involve exactly two
attributes.

(3) There are
$d_1 d_2 \cdots d_{N-1} + \cdots + d_2 d_3 \cdots d_N$
partial match queries which involve exactly N-1
attributes.

Let $\{d_{I_1}, d_{I_2}, \dots, d_{I_i}\}$ be a subset
with i elements chosen from $\{d_1, d_2, \dots, d_N\}$.
In general, there are

$$\sum_{\{d_{I_1}, \dots, d_{I_i}\} \subseteq \{d_1, \dots, d_N\}} d_{I_1} d_{I_2} \cdots d_{I_i}$$

partial match queries which involve exactly i
attributes. The total number of queries

$$= d_1 + d_2 + \cdots + d_N + d_1 d_2 + d_1 d_3 + \cdots + d_{N-1} d_N + \cdots + d_1 d_2 \cdots d_{N-1} + \cdots + d_2 d_3 \cdots d_N$$

$$= \sum_{i=1}^{N-1} \ \sum_{\{d_{I_1}, \dots, d_{I_i}\} \subseteq \{d_1, \dots, d_N\}} d_{I_1} d_{I_2} \cdots d_{I_i}$$

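Let $TNB_i$ be the total number of buckets being accessed
over all the partial match queries with i attributes
being specified.

(1) $$TNB_1 = \sum_{d_i \in \{d_1, \dots, d_N\}} d_i \cdot NB \left(1 - \left(1 - \frac{1}{NB}\right)^{d_1 d_2 \cdots d_{i-1} d_{i+1} \cdots d_N}\right)$$

(2) $$TNB_2 = \sum_{\{d_i, d_j\} \subseteq \{d_1, \dots, d_N\},\ i<j} d_i d_j \cdot NB \left(1 - \left(1 - \frac{1}{NB}\right)^{(d_1 d_2 \cdots d_N)/(d_i d_j)}\right)$$

(3) $$TNB_{N-1} = d_2 d_3 \cdots d_N \cdot NB \left(1 - \left(1 - \frac{1}{NB}\right)^{d_1}\right) + d_1 d_3 \cdots d_N \cdot NB \left(1 - \left(1 - \frac{1}{NB}\right)^{d_2}\right) + \cdots$$

In general,

$$TNB_i = \sum_{\{d_{I_1}, \dots, d_{I_i}\} \subseteq \{d_1, \dots, d_N\}} d_{I_1} \cdots d_{I_i} \cdot NB \left(1 - \left(1 - \frac{1}{NB}\right)^{(d_1 d_2 \cdots d_N)/(d_{I_1} d_{I_2} \cdots d_{I_i})}\right)$$

and

$$ANB_R = \frac{\sum_{i=1}^{N-1} TNB_i}{\sum_{i=1}^{N-1} \sum_{\{d_{I_1}, \dots, d_{I_i}\} \subseteq \{d_1, \dots, d_N\}} d_{I_1} \cdots d_{I_i}}$$

Hence given $d_1, d_2, \dots, d_N$, NB and
NR = $d_1 d_2 \cdots d_N$, BZ = NR/NB, we can calculate
$ANB_R$.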



Example 2.1 

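Let $d_1$, $d_2$, $d_3$ be 3, 4 and 5 respectively. Let
NB be equal to 4. In this case,

$$\sum_i TNB_i = 3 \times 4 \times \left(1 - (1 - \tfrac{1}{4})^{4 \times 5}\right)
+ 4 \times 4 \times \left(1 - (1 - \tfrac{1}{4})^{3 \times 5}\right)
+ 5 \times 4 \times \left(1 - (1 - \tfrac{1}{4})^{3 \times 4}\right)
+ 3 \times 4 \times 4 \times \left(1 - (1 - \tfrac{1}{4})^{5}\right)
+ 3 \times 5 \times 4 \times \left(1 - (1 - \tfrac{1}{4})^{4}\right)
+ 4 \times 5 \times 4 \times \left(1 - (1 - \tfrac{1}{4})^{3}\right)
= 170.9868.$$

$$ANB_R = 170.9868 / (3 + 4 + 5 + 3 \times 4 + 4 \times 5 + 3 \times 5) = 170.9868 / 59 = 2.8981.$$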
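
The formula for random files can be evaluated mechanically. The following is a minimal sketch, assuming the expected-bucket expression derived above; the function name `anb_random` is illustrative and does not come from the paper.

```python
from itertools import combinations
from math import prod


def anb_random(d, nb):
    """Expected buckets accessed per partial match query in a random file.

    d  : domain sizes (d_1, ..., d_N)
    nb : number of buckets NB
    """
    n = len(d)
    nr = prod(d)                      # NR = d_1 * d_2 * ... * d_N
    total_buckets = 0.0               # sum of TNB_i for i = 1 .. N-1
    total_queries = 0                 # number of distinct partial match queries
    for i in range(1, n):             # exactly i attributes specified
        for subset in combinations(d, i):
            queries = prod(subset)    # queries specifying this attribute set
            matching = nr // queries  # records satisfying one such query
            nonempty = nb * (1 - (1 - 1 / nb) ** matching)
            total_buckets += queries * nonempty
            total_queries += queries
    return total_buckets / total_queries


print(round(anb_random((3, 4, 5), 4), 4))
```

For the data of Example 2.1 the denominator is 59, and direct evaluation gives a total of roughly 170.99 and an average of roughly 2.898 buckets per query.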

We have derived the formula for the 
expected number of buckets to be accessed 
over all possible partial match queries. 
In the next sections, we shall derive similar formulas
for Cartesian product files. We hope that through these
formulas, we can show the superiority of Cartesian
product files over random files. We are still working on
this proof. That we still cannot prove it is probably
due to the fact that the formula for random files is
extremely messy.

To test our conjecture, we conducted 
a computer simulation experiment. 

The purpose of this experiment was 
to compare the performances of Cartesian 
product files and random files. We used 
the Monte Carlo simulation method. Thirty 
sets of data were generated. Each set of 
data was characterized by two, three or 
four keys. A random number generator was first used to
generate the number $d_i$, which was the number of
elements in the domain of the ith key. Then the number
NR (the number of records) was calculated according to
the following formula:

NR = $d_1 d_2 d_3 d_4$.

The same random number generator was used to generate
NB, the number of buckets, under the constraint that
NR/NB was an integer.

We then calculated the bucket size BZ 
according to the formula 

BZ = NR/NB . 

For each data set, we calculated the average number of
buckets accessed, over all possible partial match
queries, if the Cartesian product file concept was used.
This number is denoted as $ANB_{CP}$. The method of
obtaining this number will be explained in later
sections. For each set, the corresponding $ANB_R$ was
also calculated using the formula derived in this
section. The result is shown in Table 2.1. From the
experimental results



Table 2.1 here 



obtained thus far, it can be seen that 
Cartesian product files are indeed better 
than random files. 

Section 3. The Designing of Optimal
Cartesian Product Files

If a file is a Cartesian product file, for every bucket,
records in this bucket are of the form of

$$D_{1s} \times D_{2s} \times \cdots \times D_{Ns}, \qquad D_{is} \subseteq D_i.$$

Let the domain size of $D_{is}$ be denoted as $z_i$. To
simplify our discussion, we shall assume that $z_i$ is
the same for every bucket. Note that this is not the
case for the file shown in Example 2.2. In that case,
$z_1 = 3$ for Bucket 1 and $z_1 = 2$ for Bucket 3. It is
much more complicated to design such an optimal file.

For a Cartesian product file, to 
minimize the average number of buckets to 
be examined over all possible partial 
match queries, we may simply try to 
minimize the total number of queries which 
need to examine a bucket in the file. 
(Note that this number is the same for all 
buckets in a Cartesian product file.) We 
now ask, what is the number of partial 
match queries which need to examine a 
bucket in a Cartesian product file? The 
answer is as follows. 

(1) There are $z_1 + z_2 + \cdots + z_N$ partial match
queries which involve exactly one attribute.

(2) There are $z_1 z_2 + z_1 z_3 + \cdots + z_{N-1} z_N$
partial match queries which involve exactly two
attributes.

(3) There are
$z_1 z_2 \cdots z_{N-1} + \cdots + z_2 z_3 \cdots z_N$
partial match queries which involve exactly N-1
attributes.

In total, for each bucket in a Cartesian product file,
the number of partial match queries which need to
examine this bucket is

$$z_1 + \cdots + z_N + z_1 z_2 + \cdots + z_{N-1} z_N + \cdots + z_1 z_2 \cdots z_{N-1} + \cdots + z_2 z_3 \cdots z_N$$

Let us now state formally the problem 
of designing an optimal Cartesian product 
file as follows. 

We assume that each record is characterized by N
attributes $A_1, A_2, \dots, A_N$ and the domain of $A_i$
is $D_i$. The size of $D_i$ is $d_i$. There are totally
$d_1 d_2 \cdots d_N$ records present. The number of
buckets is NB. The bucket size is therefore
$(d_1 d_2 \cdots d_N)/NB = C$ (C is an integer).

Our problem is to find $z_1, z_2, \dots, z_N$ satisfying
the following conditions:

(1) $z_1, z_2, \dots$ and $z_N$ are integers.

(2) $z_1 z_2 \cdots z_N = C$.

(3) $d_i / z_i = m_i$ = an integer. (This means that
each domain $D_i$ is divided into $m_i$ equal subsets,
where the size of each subset is $z_i$.)

(4) $z_1 + z_2 + \cdots + z_N + z_1 z_2 + \cdots + z_{N-1} z_N + \cdots + z_1 z_2 \cdots z_{N-1} + \cdots + z_2 z_3 \cdots z_N$
is minimized over all possible $(z_1, z_2, \dots, z_N)$'s
satisfying (1), (2) and (3).

Example 3.1 

Consider the case where

$d_1 = 8$
$d_2 = 4$
$d_3 = 9$
and NB = 6.

In this case, the bucket size is (8×4×9)/6 = 48. There
are two feasible solutions satisfying the first three
conditions. The first one is $z_1 = 8$, $z_2 = 2$ and
$z_3 = 3$. The second one is $z_1 = 4$, $z_2 = 4$ and
$z_3 = 3$.

For the first solution,

$z_1 + z_2 + z_3 + z_1 z_2 + z_1 z_3 + z_2 z_3$
= 8+2+3+8×2+8×3+2×3
= 59.

For the second solution,

$z_1 + z_2 + z_3 + z_1 z_2 + z_1 z_3 + z_2 z_3$
= 4+4+3+4×4+4×3+4×3
= 51.

We therefore conclude that the second solution is the
optimum solution. In this case,

$m_1 = 8/4 = 2$
$m_2 = 4/4 = 1$
$m_3 = 9/3 = 3$.

Our Cartesian product file system divides $D_1$ into two
subsets: $D_{11}$ and $D_{12}$, $D_2$ into one subset,
and $D_3$ into three subsets: $D_{31}$, $D_{32}$ and
$D_{33}$. The six buckets are arranged as follows:

Bucket 1: $D_{11} \times D_2 \times D_{31}$
Bucket 2: $D_{11} \times D_2 \times D_{32}$
Bucket 3: $D_{11} \times D_2 \times D_{33}$
Bucket 4: $D_{12} \times D_2 \times D_{31}$
Bucket 5: $D_{12} \times D_2 \times D_{32}$
Bucket 6: $D_{12} \times D_2 \times D_{33}$
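
The design problem of Example 3.1 is small enough to solve by brute force. The sketch below enumerates every candidate $(z_1, \dots, z_N)$ built from divisors of the domain sizes and keeps those whose product is the bucket size. The helper names (`divisors`, `cost`, `feasible_solutions`, `best_solution`) are assumptions made for illustration, not names used in the paper.

```python
from itertools import combinations, product
from math import prod


def divisors(n):
    """All positive divisors of n, in increasing order."""
    return [k for k in range(1, n + 1) if n % k == 0]


def cost(z):
    """Number of partial match queries that must examine one bucket:
    the sum of all products of 1 .. N-1 distinct entries of z."""
    n = len(z)
    return sum(prod(c) for k in range(1, n) for c in combinations(z, k))


def feasible_solutions(d, nb):
    """All (z_1..z_N) with z_i dividing d_i and z_1*...*z_N = bucket size."""
    c, remainder = divmod(prod(d), nb)
    if remainder:
        raise ValueError("NB must divide the number of records")
    return [z for z in product(*(divisors(di) for di in d)) if prod(z) == c]


def best_solution(d, nb):
    """Feasible solution with the smallest cost (exhaustive search)."""
    return min(feasible_solutions(d, nb), key=cost)


for z in feasible_solutions((8, 4, 9), 6):
    print(z, cost(z))
print("best:", best_solution((8, 4, 9), 6))
```

Restricting each $z_i$ to the divisors of $d_i$ enforces condition (3) by construction, so only the product condition has to be filtered.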



The reader may wonder how this 
optimum solution can be found. Since an 
integer can be factored into a finite 
number of different N-tuples, there are a 
finite number of feasible solutions, and 
we can conduct an exhaustive search. That 
is, given $z_1, \dots, z_N$, we can calculate

$$z_1 + z_2 + \cdots + z_N + \cdots + z_1 z_2 \cdots z_{N-1} + \cdots + z_2 z_3 \cdots z_N.$$

We then choose the $z_i$'s such that the above is
minimized. However, we shall show that an exhaustive
search through all possible solutions can be avoided.
Let us consider Example 3.1 again. The first solution of
the problem is (8,2,3). In this 3-tuple, there exists a
pair (8,2) which can be transformed into (4,4) (8×2=4×4)
without affecting the feasibility of the solution.
However, this transformation decreases not only the
value of $z_1 + z_2 + z_3$ but also the value of
$z_1 z_2 + z_1 z_3 + z_2 z_3$.

For the second solution (4,4,3), there simply does not
exist a pair $(z_i, z_j)$ such that $(z_i, z_j)$ can be
transformed into $(z_i', z_j')$ where
$z_i' z_j' = z_i z_j$ and $z_i' + z_j' < z_i + z_j$.

Let us now consider the following problem: Given an
N-tuple $(z_1, z_2, \dots, z_N)$ where the $z_i$'s are all
integers and $\prod_{i=1}^{N} z_i = C$, can we transform
it into another N-tuple $(z_1', z_2', \dots, z_N')$ such
that the $z_i'$'s are all integers,
$\prod_{i=1}^{N} z_i' = C$, but the value of

$$z_1' + z_2' + \cdots + z_N' + z_1' z_2' + \cdots + z_{N-1}' z_N' + \cdots + z_1' z_2' \cdots z_{N-1}' + \cdots + z_2' z_3' \cdots z_N'$$

is smaller than the value of

$$z_1 + z_2 + \cdots + z_N + z_1 z_2 + \cdots + z_{N-1} z_N + \cdots + z_1 z_2 \cdots z_{N-1} + \cdots + z_2 z_3 \cdots z_N \ ?$$

In the following section, we shall discuss this problem
and its solution in detail.

Section 4. Some Theories of Minimal N-tuples

In the rest of this paper, whenever we mention an
N-tuple $(a_1, a_2, \dots, a_N)$, we shall assume that
$a_i$ is an integer. Without losing generality, whenever
possible, we shall also assume that $a_i \le a_{i+1}$.

Definition:

An N-tuple $(a_1, a_2, \dots, a_N)$ is called an N-tuple
of C if $\prod_{i=1}^{N} a_i = C$.

Definition:

A 2-tuple $(a_1, a_2)$ is called a minimal 2-tuple if
for every other 2-tuple $(a_1', a_2')$ where
$a_1' a_2' = a_1 a_2$, $a_1 + a_2 \le a_1' + a_2'$.

Definition:

An N-tuple $(a_1, a_2, \dots, a_N)$ is called a minimal
N-tuple of C if $\prod_{i=1}^{N} a_i = C$ and for
$1 \le i, j \le N$, $(a_i, a_j)$ is a minimal 2-tuple.

Example 4.1 

(2,4,9) is not a minimal 3-tuple because (2,9) and (4,9)
are not minimal 2-tuples. The 3-tuple (3,4,6) is a
minimal 3-tuple because each pair in this 3-tuple is a
minimal 2-tuple.
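
The definitions above translate directly into a divisor search. For a fixed product A the sum p + A/p shrinks as p approaches the square root of A, so the minimal 2-tuple of A pairs the largest divisor not exceeding that root with its cofactor. The sketch below relies on that observation; the function names are illustrative assumptions.

```python
from itertools import combinations
from math import isqrt


def minimal_2_tuple(a):
    """Return (p, q) with p*q == a, p <= q and p + q as small as possible."""
    for p in range(isqrt(a), 0, -1):   # largest divisor <= sqrt(a) wins
        if a % p == 0:
            return (p, a // p)


def is_minimal_pair(x, y):
    """True if (x, y) is a minimal 2-tuple of x*y (order ignored)."""
    return minimal_2_tuple(x * y) == (min(x, y), max(x, y))


def is_minimal_tuple(t):
    """True if every pair drawn from t is a minimal 2-tuple."""
    return all(is_minimal_pair(x, y) for x, y in combinations(t, 2))


print(is_minimal_tuple((2, 4, 9)))   # pairs (2,9) and (4,9) fail
print(is_minimal_tuple((3, 4, 6)))
```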

Definition : 

Given an N-tuple $S = (a_1, a_2, \dots, a_N)$, F(S,K),
$1 \le K \le N$, is defined as follows:

$$F(S,K) = \sum_{i_1 < i_2 < \cdots < i_K} a_{i_1} a_{i_2} \cdots a_{i_K}$$

for all possible $(i_1, i_2, \dots, i_K)$'s.

For instance,

$$F(S,1) = a_1 + a_2 + \cdots + a_N$$

$$F(S,2) = a_1 a_2 + a_1 a_3 + \cdots + a_{N-1} a_N$$

$$F(S,N-1) = a_1 a_2 \cdots a_{N-1} + \cdots + a_2 a_3 \cdots a_N$$
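
F(S,K) is the sum of the products of all K-element selections from S, so it can be computed with a combinations iterator. This is a minimal sketch; the name `F` mirrors the paper's notation, and the sample tuples are the two feasible solutions of Example 3.1.

```python
from itertools import combinations
from math import prod


def F(s, k):
    """Sum of the products of all k-element selections from the tuple s."""
    return sum(prod(c) for c in combinations(s, k))


for s in [(8, 2, 3), (4, 4, 3)]:
    print(s, F(s, 1), F(s, 2), F(s, 1) + F(s, 2))
```

For (8,2,3) the two values add up to 59 and for (4,4,3) to 51, matching the totals computed in Example 3.1.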

In the following, we shall present an 
algorithm which transforms an arbitrary 
N-tuple of C into a minimal N-tuple of C. 

Algorithm A: An algorithm which transforms an N-tuple of
C into a minimal N-tuple of C.

Input: $(a_1, a_2, \dots, a_N)$ and
$\prod_{i=1}^{N} a_i = C$.

Output: A minimal N-tuple of C.

Step 1: I←N, J←N-1.

Step 2: A = $a_I \cdot a_J$. Find (p,q) where (p,q) is a
minimal 2-tuple of A.

Step 3: If (p,q) = $(a_J, a_I)$, go to Step 6.

Step 4: Replace $a_J$ and $a_I$ by p and q and reorder
$(a_1, a_2, \dots, p, q, \dots)$. We obtain a new N-tuple
$(a_1, a_2, \dots, a_N)$ such that $a_{i-1} \le a_i$ for
i = 2, ..., N.

Step 5: Return to Step 1.

Step 6: If J is not equal to 1, J←J-1 and return to
Step 2.

Step 7: If I is not equal to 2, I←I-1, J←I-1 and return
to Step 2.

Step 8: $(a_1, a_2, \dots, a_N)$ is a minimal N-tuple of
C.

Example 4.2 

Input (34,35,105)

1) I=3, J=2.

2) A = $a_2 a_3$ = 35×105 = 3675. We find (49,75) as the
minimal 2-tuple of 3675.

3) Reorder. We obtain (34,49,75).

4) Return to Step 1.

5) I=3, J=2.

6) A = $a_2 a_3$ = 49×75 = 3675. We find that (49,75) is
the minimal 2-tuple of 3675.

7) Go to Step 6. Since J=2≠1, J←1. Return to Step 2.

8) A = $a_1 a_3$ = 34×75 = 2550. We find (50,51) as the
minimal 2-tuple of 2550.

9) Reorder. We obtain (49,50,51). Since every 2-tuple in
(49,50,51) is a minimal 2-tuple, (49,50,51) is the
output.
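
Algorithm A can be sketched in a few lines. The version below follows the same strategy (scan pairs from the largest indices downward, replace a non-minimal pair by its minimal 2-tuple, re-sort and restart), but it is an illustrative reading of the steps rather than a line-by-line transcription, and the function names are assumptions.

```python
from math import isqrt


def minimal_2_tuple(a):
    """Return (p, q) with p*q == a, p <= q and p + q as small as possible."""
    for p in range(isqrt(a), 0, -1):
        if a % p == 0:
            return (p, a // p)


def algorithm_a(t):
    """Transform an N-tuple of C into a minimal N-tuple of C."""
    t = sorted(t)
    n = len(t)
    changed = True
    while changed:                      # each pass corresponds to "Step 1"
        changed = False
        for i in range(n - 1, 0, -1):   # I = N, N-1, ..., 2 (0-based here)
            for j in range(i - 1, -1, -1):
                p, q = minimal_2_tuple(t[i] * t[j])
                if (p, q) != (t[j], t[i]):   # t is sorted, so t[j] <= t[i]
                    t[j], t[i] = p, q        # Step 4: replace the pair
                    t.sort()                 # and reorder
                    changed = True
                    break
            if changed:
                break                   # Step 5: restart the scan
    return tuple(t)


print(algorithm_a((34, 35, 105)))
```

Every replacement strictly lowers the sum of the entries while keeping the product fixed, which is why the loop cannot run forever.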

Let us check into Example 4.2 again. 
The 3-tuples transformed are: 

(34,35,105) → (34,49,75) → (49,50,51)

The F(S,1)'s corresponding to the above 3-tuples are
174, 158 and 150 respectively. We note that after each
step of transformation, F(S,1) is decreased. We shall
give this kind of transformation a special name.



Definition:

Let $S = (a_1, a_2, \dots, a_N)$,
$T = (a_1', a_2', \dots, a_N')$ and
$\prod_{i=1}^{N} a_i = \prod_{i=1}^{N} a_i'$. If in S
there exists an i such that $a_i = pq$, and in T there
exists a j such that $a_j' = p a_j$, $a_i' = q$ and for
all k, $k \ne i, j$, $a_k' = a_k$, then T is a
pq-transformation of S.

Example 4.3

Let S=(1,2,16) and T=(2,2,8). T is a pq-transformation
of S. In this case, p=2, q=8, i=3, j=1.

Definition:

Let T be a pq-transformation of S. If F(T,1)<F(S,1), T
is a successful pq-transformation of S.

Lemma 4.1

Let $S = (a_1, a_2, \dots, a_N)$ and
$T = (a_1', a_2', \dots, a_N')$. Let T be a
pq-transformation of S. If in this pq-transformation,
p>1 and $q > a_j$, T is a successful pq-transformation
of S.

Proof:

Since T is a pq-transformation of S, there exist an i
and a j such that in S, $a_i = pq$, and in T, $a_i' = q$
and $a_j' = p a_j$. Therefore,

$$a_i + a_j - a_i' - a_j' = pq + a_j - q - p a_j = (p-1)(q - a_j) > 0.$$

Consequently,

$$a_1' + a_2' + \cdots + a_i' + \cdots + a_j' + \cdots + a_N' < a_1 + a_2 + \cdots + a_i + \cdots + a_j + \cdots + a_N.$$

Hence T is a successful pq-transformation of S. Q.E.D.

Example 4.4 

Let S=(2,6,8). Since $a_3$=8=2×4, we may choose p=2 and
q=4. Applying this pq-transformation to S by letting j
be 1, we obtain T=(2×2,6,4)=(4,6,4). It is easy to see
that this is a successful pq-transformation.
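
A pq-transformation moves the factor p out of one entry and multiplies it into another. The sketch below makes that concrete using 0-based positions, so the paper's i=3, j=1 become `i=2`, `j=0`. The function names are assumptions made for illustration.

```python
def pq_transform(s, i, j, p, q):
    """Return T obtained from S by replacing s[i] = p*q with q
    and s[j] with p*s[j] (0-based positions)."""
    if i == j or s[i] != p * q:
        raise ValueError("need i != j and s[i] == p*q")
    t = list(s)
    t[i] = q
    t[j] = p * s[j]
    return tuple(t)


def is_successful(s, t):
    """A pq-transformation is successful if it lowers F(.,1), the plain sum."""
    return sum(t) < sum(s)


t = pq_transform((2, 6, 8), i=2, j=0, p=2, q=4)
print(t, is_successful((2, 6, 8), t))
```

In this call p=2>1 and q=4 exceeds the entry 2 that receives the factor, so the conditions of Lemma 4.1 hold and the sum drops from 16 to 14.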

Using Algorithm A and Lemma 4.1, we 
may prove the following: 



Lemma 4.2

Let $S = (a_1, a_2, \dots, a_N)$ where
$\prod_{i=1}^{N} a_i = C$. S can be converted into a
minimal N-tuple of C by a finite number of successful
pq-transformations.

Proof:

Note that Algorithm A always terminates and produces an
N-tuple in which every pair $(a_i, a_j)$ is a minimal
2-tuple. Since, according to Lemma 4.1, every transform
executed in the algorithm is a successful
pq-transformation, we have the proof. Q.E.D.

In the following, we shall prove that after a successful
pq-transformation, not only is F(S,1) reduced (by
definition), but F(S,2), ..., F(S,N-1) are all
simultaneously reduced. Let us first demonstrate this by
considering the case where N=4 and K=2.

Example 4.5 

Let S=(a,b,c,d), T=(a,b',c',d) and T be a successful
pq-transformation of S. In this example, we shall show
that F(T,2)<F(S,2). To show this, let us first note that

$$F(S,1) - F(T,1) = a + b + c + d - (a + b' + c' + d) = b + c - b' - c' > 0.$$

$$F(S,2) - F(T,2) = (ab + ac + ad + bc + bd + cd) - (ab' + ac' + ad + b'c' + b'd + c'd)$$

$$= ((a + c + d)b + (a + d)c) - ((a + b' + d)c' + (a + d)b')$$

$$= (b(a + c + d) - c'(a + b' + d)) + (c - b')(a + d) \qquad (1)$$

Since T is a pq-transformation of S, we have bc=b'c'.
Substituting this into (1), we have

$$F(S,2) - F(T,2) = b(a + d) - c'(a + d) + (c - b')(a + d) = (a + d)(b + c - b' - c') > 0.$$

We now prove the following lemma. 

Lemma 4.3

Let $S = (a_1, a_2, \dots, a_N)$,
$T = (a_1', a_2', \dots, a_N')$ and T be a successful
pq-transformation of S. Then F(T,K)<F(S,K) for
K=1,2,...,N-1.

Proof:

Since T is a successful pq-transformation of S, we have

$$a_1 + \cdots + a_{i-1} + q + a_{i+1} + \cdots + a_{j-1} + p a_j + a_{j+1} + \cdots + a_N < a_1 + \cdots + a_{i-1} + pq + a_{i+1} + \cdots + a_{j-1} + a_j + a_{j+1} + \cdots + a_N.$$

Therefore,

$$q + p a_j < pq + a_j,$$

$$(q - a_j)(p - 1) > 0,$$

that is, p>1 and $q > a_j$, or p<1 and $q < a_j$.

Consider F(S,K)-F(T,K). Products that contain neither
position i nor position j are identical in S and T, and
so are products that contain both, because
$a_i a_j = pq\,a_j = a_i' a_j'$. Only the products that
contain exactly one of the two positions remain. Writing

$$\Sigma = \sum_{i_1 < \cdots < i_{K-1};\ i_l \ne i, j} a_{i_1} a_{i_2} \cdots a_{i_{K-1}}$$

(with $\Sigma = 1$ when K=1), we have

$$F(S,K) - F(T,K) = a_j \Sigma + pq\,\Sigma - q\,\Sigma - p a_j \Sigma$$

$$= (a_j - q)\Sigma + p(q - a_j)\Sigma$$

$$= (q - a_j)(p - 1)\Sigma > 0.$$

Hence, F(S,K)>F(T,K). Q.E.D.



Using Lemma 4.2 and Lemma 4.3, we can 
prove the following theorem. 



Theorem 4.1

Let $S = (a_1, a_2, \dots, a_N)$ and
$\prod_{i=1}^{N} a_i = C$. If S is not a minimal N-tuple
of C, S can be transformed into
$S' = (a_1', a_2', \dots, a_N')$ such that S' is a
minimal N-tuple of C and F(S',K)<F(S,K) for
K=1,2,...,N-1.

Proof:

According to Lemma 4.2 and Algorithm A, we can apply a
sequence of pq-transformations to S to transform S into
S' such that S' is a minimal N-tuple of C. Assume that
Algorithm A takes M steps to finish. Let $S_0 = S$ and
after the execution of the m-th step, the N-tuple
becomes $S_m$. We now have $S_0, S_1, \dots, S_M$ where
$S_0 = S$ and $S_M = S'$. According to Lemma 4.3,
$F(S_i, K) < F(S_{i-1}, K)$ for i=1,2,...,M and
K=1,2,...,N-1. In particular,
$F(S', K) = F(S_M, K) < F(S_0, K) = F(S, K)$. Thus the
proof. Q.E.D.

Theorem 4.1 states that for any given 
N-tuple S of a constant C, if S is not 
minimal, we can always transform it into 
a minimal N-tuple of C. After this 
transformation, F(S,K) is reduced for all 
K=1,2,...,N-1. 

Corollary 4.1:

If there is only one minimal N-tuple of a constant C,
$\sum_{K=1}^{N-1} F(S,K)$ is the smallest among all
possible N-tuples of C, iff S is the only minimal
N-tuple of C.

Unfortunately, while there is only 
one minimal N-tuple for most cases, there 
are counter examples. We conducted a 
computerized checking through all integers 
from 1 to 1000 for N=3. We found that among these one
thousand integers, integer 360 has two minimal 3-tuples,
namely $S_1$=(6,6,10) and $S_2$=(5,8,9). It is
interesting to note that this is the only counter
example found among these one thousand integers.
Furthermore, it should be noted that

$F(S_1,2)$ = 6×6+6×10+6×10 = 156
and $F(S_2,2)$ = 5×8+5×9+8×9 = 157.

Although $F(S_1,2) \ne F(S_2,2)$, the difference between
them is small.
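
The check for C=360 can be reproduced by listing every nondecreasing N-tuple with product C and keeping those in which all pairs are minimal 2-tuples. The sketch below is one straightforward way to do this and is not the authors' program; all function names are assumptions.

```python
from itertools import combinations
from math import isqrt


def minimal_2_tuple(a):
    """Return (p, q) with p*q == a, p <= q and p + q as small as possible."""
    for p in range(isqrt(a), 0, -1):
        if a % p == 0:
            return (p, a // p)


def is_minimal_tuple(t):
    """True if every pair drawn from t is a minimal 2-tuple."""
    return all(minimal_2_tuple(x * y) == (min(x, y), max(x, y))
               for x, y in combinations(t, 2))


def sorted_tuples(c, n, lo=1):
    """Yield nondecreasing n-tuples of integers >= lo whose product is c."""
    if n == 1:
        if c >= lo:
            yield (c,)
        return
    for a in range(lo, c + 1):
        if c % a == 0:
            for rest in sorted_tuples(c // a, n - 1, a):
                yield (a,) + rest


def minimal_tuples(c, n):
    """All minimal N-tuples of c, each listed once in nondecreasing order."""
    return [t for t in sorted_tuples(c, n) if is_minimal_tuple(t)]


print(minimal_tuples(360, 3))
```

Looping `minimal_tuples(c, 3)` over a range of values of c and reporting those with more than one result would repeat the kind of scan described above.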

Let us conclude this section by the 
following statements: 

(1) Given a constant C and an N-tuple
$S = (a_1, a_2, \dots, a_N)$ of C, if S is not a minimal
N-tuple of C, S can be transformed into S' such that S'
is a minimal N-tuple of C and

$$\sum_{K=1}^{N-1} F(S',K) < \sum_{K=1}^{N-1} F(S,K).$$

(2) For most constants C, since there is only one
minimal N-tuple of C, this minimal N-tuple S' of C has
the property that F(S',K) is minimized over all possible
N-tuples of C.

Section 5. The Application of N-tuple
Theories to the Design of
Cartesian Product Files

In Section 3, we showed that the 
problem of designing an optimal Cartesian 
product file can be reduced to the problem 
of dividing each domain into subsets 
where each subset contains $z_i$ elements. The values of
$z_1, z_2, \dots, z_N$ should satisfy the following
conditions:

1. $z_1 z_2 \cdots z_N = C$ = bucket size

2. $d_i / z_i = m_i$ = an integer

3. $z_1 + z_2 + \cdots + z_N + z_1 z_2 + z_1 z_3 + \cdots + z_{N-1} z_N + \cdots + z_1 z_2 \cdots z_{N-1} + \cdots + z_2 z_3 \cdots z_N$
is minimized.

Using Theorem 4.1, we can obtain the 
following theorem. 

Theorem 5.1 

Let there be NR = $d_1 d_2 \cdots d_N$ records where
$d_i$ is the size of the domain $D_i$ of attribute
$A_i$. Let C be the bucket size. A Cartesian product
file F is an optimal Cartesian product file if the
records of each bucket are of the form of

$$D_{1s} \times D_{2s} \times \cdots \times D_{Ns}$$

where the size of $D_{is}$ is $z_i$ and the $z_i$'s
satisfy the following conditions:

(1) $z_1 z_2 \cdots z_N = C$,

(2) $d_i / z_i = m_i$ = an integer,

(3) $(z_1, z_2, \dots, z_N)$ is the only minimal N-tuple
of C.

To obtain a set of $z_i$'s satisfying conditions (1) and
(3), we may simply apply Algorithm A to the N-tuple
(1,1,...,C). If we can rearrange the resulting N-tuple
to be $(z_1, z_2, \dots, z_N)$ in such a way that
$d_i / z_i = m_i$ = an integer for all $1 \le i \le N$,
and we are further sure that $(z_1, z_2, \dots, z_N)$ is
the only minimal N-tuple, then we have obtained an
optimal Cartesian product file. Here, the following
should be pointed out.

(1) It is very rare, as our experimental results
demonstrate, that there is more than one minimal N-tuple
for a constant C.

(2) Even if $S = (z_1, z_2, \dots, z_N)$ is not the only
minimal N-tuple, indicating that there might exist an
$S' \ne (z_1, z_2, \dots, z_N)$ such that F(S',K)<F(S,K)
for some K, F(S',K) will not be significantly smaller
than F(S,K). Besides, if such a K exists, this file is
still optimal for queries with fewer than K attributes
specified.

Example 5.1 

Let $d_1 = 8$, $d_2 = 4$, $d_3 = 6$ and C=32. Applying
Algorithm A to (1,1,32), we obtain (2,4,4) as a minimal
3-tuple. It is not difficult to see that this is the
only minimal 3-tuple of 32. We rearrange (2,4,4) into
(4,4,2).

Then $d_1 / z_1$ = 8/4 = 2,
$d_2 / z_2$ = 4/4 = 1
and $d_3 / z_3$ = 6/2 = 3.

This means that $D_1$ should be divided into two
subsets, $D_2$ into one subset and $D_3$ into three
subsets. The resulting Cartesian product file is an
optimal Cartesian product file.

If $d_1 = d_2 = \cdots = d_N$ in the resulting N-tuple S
after Algorithm A is applied,
$z_1 = z_2 = \cdots = z_N$. In this case, S is the only
minimal N-tuple of C. If $d_i / z_i$ = an integer for
all i, the resulting Cartesian product file must be the
optimum Cartesian product file for this set of records.
This coincides with the result obtained by Rivest
[1976].
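
The procedure used in Example 5.1 (run Algorithm A on (1,1,...,C), then look for a rearrangement whose entries divide the domain sizes) can be sketched as follows. This is an illustrative reading of the text; `design_file` and the other names are assumptions, and the sketch does not check whether the minimal N-tuple found is the only one.

```python
from itertools import permutations
from math import isqrt


def minimal_2_tuple(a):
    """Return (p, q) with p*q == a, p <= q and p + q as small as possible."""
    for p in range(isqrt(a), 0, -1):
        if a % p == 0:
            return (p, a // p)


def algorithm_a(t):
    """Replace non-minimal pairs until every pair is a minimal 2-tuple."""
    t = sorted(t)
    changed = True
    while changed:
        changed = False
        for i in range(len(t) - 1, 0, -1):
            for j in range(i - 1, -1, -1):
                p, q = minimal_2_tuple(t[i] * t[j])
                if (p, q) != (t[j], t[i]):
                    t[j], t[i] = p, q
                    t.sort()
                    changed = True
                    break
            if changed:
                break
    return tuple(t)


def design_file(d, c):
    """Return (z, m): subset sizes z_i and subset counts m_i = d_i / z_i,
    or None if no rearrangement of the minimal tuple divides the domains."""
    s = algorithm_a((1,) * (len(d) - 1) + (c,))
    for z in permutations(s):
        if all(di % zi == 0 for di, zi in zip(d, z)):
            return z, tuple(di // zi for di, zi in zip(d, z))
    return None


print(design_file((8, 4, 6), 32))
```

With the data of Example 5.1 the search settles on z=(4,4,2) and m=(2,1,3), the division of the three domains into two, one and three subsets.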

Section 6. The Theories of Minimal N- 
tuple and Partial Match 
Patterns 

In the previous sections, we assumed that all records in
$D_1 \times D_2 \times \cdots \times D_N$ were present.
This is obviously not a practical assumption.
Unfortunately, it is difficult to design an optimal
Cartesian product file for partial match queries where
some records are missing. In this section, we shall
introduce the multi-key hashing method [Rothnie and
Lozano 1974] which does not require the assumption that
all records are present. We then introduce the partial
match pattern concept defined by Lin, Lee and Du [1979].
Finally, we show how the theories of minimal N-tuples
can be applied to design a Cartesian product file which
is optimal with respect to partial match patterns.

The multi-key hashing method can be briefly defined as
follows:

(1) Choose a hashing function $g_i$ for domain $D_i$
such that $g_i : D_i \to \{0, 1, \dots, m_i - 1\}$ where
$m_1 m_2 \cdots m_N = NB$, the total number of buckets
required by the file.

(2) Associate with each N-tuple
$(L_1, L_2, \dots, L_N)$ a bucket where $L_i$ is an
integer, $0 \le L_i \le m_i - 1$.

(3) If the attributes $A_1, A_2, \dots, A_N$ of a record
R have values $r_1, r_2, \dots, r_N$ respectively, assign
R into the bucket associated with
$(g_1(r_1), g_2(r_2), \dots, g_N(r_N))$ where
$r_i \in D_i$ for i=1,2,...,N.

Let us consider the case where $D_1 = \{a, b, c, d\}$
and $D_2 = \{e, f, g\}$. We can define the following
hashing functions:

$g_1(x) = 0$ if x=a,b
$g_1(x) = 1$ if x=c,d.

$g_2(x) = 0$ if x=e,f
$g_2(x) = 1$ if x=g.

In this case, records will be hashed 
into their respective buckets as shown in 
Table 6.1. The reader should note that 



Table 6.1 here 



not all records are present. It should also be obvious
that a file produced by the multi-key hashing method is
a Cartesian product file.

If we ignore the overflow problem, the retrieval of the
record $(r_1, r_2, \dots, r_N)$ (every attribute is
used) needs examining exactly one bucket. However, the
partial match query with $A_i = r_i$ (for any $r_i$)
examines $NB / m_i$ buckets, the query with $A_i = r_i$
and $A_j = r_j$ (for any $r_i$ and $r_j$) examines
$NB / (m_i m_j)$ buckets, etc.
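
The two-attribute example can be turned into a small sketch of multi-key hashing. The hash tables below encode the functions $g_1$ and $g_2$ given for $D_1 = \{a,b,c,d\}$ and $D_2 = \{e,f,g\}$; the function names and the use of `None` for an unspecified attribute are assumptions made for illustration.

```python
from itertools import product
from math import prod

# Hashing functions from the example: g1 on D1, g2 on D2.
G1 = {"a": 0, "b": 0, "c": 1, "d": 1}
G2 = {"e": 0, "f": 0, "g": 1}
HASHES = (G1, G2)
M = tuple(len(set(g.values())) for g in HASHES)   # (m_1, m_2) = (2, 2)
NB = prod(M)                                      # total number of buckets


def bucket_of(record):
    """Bucket address (g_1(r_1), ..., g_N(r_N)) of a full record."""
    return tuple(g[r] for g, r in zip(HASHES, record))


def buckets_examined(query):
    """Bucket addresses a partial match query must examine.
    query holds one value per attribute, or None if unspecified."""
    choices = [range(m) if v is None else [g[v]]
               for g, m, v in zip(HASHES, M, query)]
    return list(product(*choices))


print(bucket_of(("a", "e")))
print(buckets_examined(("a", None)))    # NB / m_1 = 2 buckets
print(buckets_examined(("d", "g")))     # full record: one bucket
```

Each specified attribute pins one coordinate of the bucket address, which is where the counts $NB/m_i$ and $NB/(m_i m_j)$ come from.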

Lin, Lee and Du [1979] defined a partial match pattern
to be a class of partial match queries.

If the partial match queries involve the same set of
attributes, they belong to the same partial match
pattern. For instance, the partial match query
$A_i = r_i$ and $A_j = r_j$ belongs to the partial match
pattern $(A_i, A_j)$. The partial match query
$A_i = s_i$ and $A_j = s_j$ belongs to the same partial
match pattern $(A_i, A_j)$.

Let us consider the case when N=3. The total number of
buckets to be examined, over all possible partial match
patterns involving exactly one attribute, is

$$\frac{NB}{m_1} + \frac{NB}{m_2} + \frac{NB}{m_3} = NB \left(\frac{m_2 m_3 + m_1 m_3 + m_1 m_2}{m_1 m_2 m_3}\right) = m_1 m_2 + m_1 m_3 + m_2 m_3 \qquad (m_1 m_2 m_3 = NB)$$

Similarly, the total number of buckets to be examined,
over all possible partial match patterns involving
exactly two attributes, is

$$\frac{NB}{m_1 m_2} + \frac{NB}{m_2 m_3} + \frac{NB}{m_1 m_3} = NB \left(\frac{m_1 + m_2 + m_3}{m_1 m_2 m_3}\right) = m_1 + m_2 + m_3$$

The average number of buckets to be examined, over all
possible partial match patterns, is

$$(m_1 m_2 + m_1 m_3 + m_2 m_3 + m_1 + m_2 + m_3) \Big/ \left(N + \binom{N}{2}\right)$$

where $N + \binom{N}{2}$ is the total number of possible
partial match patterns. Since this is a constant, to
minimize the average number of buckets to be examined,
we merely have to minimize

$$m_1 + m_2 + m_3 + m_1 m_2 + m_1 m_3 + m_2 m_3$$

under the constraint that $m_1 m_2 m_3 = NB$.
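
The average over partial match patterns is easy to evaluate for any candidate $(m_1, \dots, m_N)$. The sketch below counts, for each pattern with 1 to N-1 specified attributes, the NB divided by the product of the corresponding $m_i$'s; the function name is an assumption made for illustration.

```python
from itertools import combinations
from math import prod


def average_buckets_per_pattern(m):
    """Average number of buckets examined over all partial match patterns
    that specify between 1 and N-1 attributes."""
    n = len(m)
    nb = prod(m)                                   # NB = m_1 * ... * m_N
    per_pattern = [nb // prod(c)                   # NB / (product of chosen m_i)
                   for k in range(1, n)
                   for c in combinations(m, k)]
    return sum(per_pattern) / len(per_pattern)


print(average_buckets_per_pattern((2, 2, 3)))   # NB = 12, balanced split
print(average_buckets_per_pattern((1, 3, 4)))   # NB = 12, less balanced
```

For NB=12 the split (2,2,3) totals 23 bucket examinations over the six patterns while (1,3,4) totals 27, so the more balanced tuple gives the lower average.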







In general, our problem of designing 
an optimal Cartesian product file for 
partial match patterns is as follows: 
Given NB, the total number of buckets, and N, the total
number of attributes, we should find an N-tuple
$S = (m_1, m_2, \dots, m_N)$ satisfying the following
conditions:

(1) $m_1, m_2, \dots, m_N$ are all integers.

(2) $\prod_{i=1}^{N} m_i = NB$.

(3) $\sum_{K=1}^{N-1} F(S,K)$ is minimized over all
possible N-tuples satisfying (1) and (2).

The reader can now see that the 
theories developed in Section 5 are 
directly applicable to the partial match 
pattern problem. In fact, we can easily 
prove the following theorem. 

Theorem 6.1 

For the multi-key hashing method, if each record is
characterized by N attributes and NB is the total number
of buckets required in the file, then the average number
of buckets examined, over all possible partial match
patterns, is minimized when the hashing function divides
each $D_i$ into $m_i$ subsets and the N-tuple
$S = (m_1, m_2, \dots, m_N)$ is the only minimal N-tuple
of NB.

It should be noted that an optimal N-tuple
$S = (m_1, m_2, \dots, m_N)$ can be obtained by applying
Algorithm A to (1,1,...,NB). If S is the only minimal
N-tuple of NB, we have obtained an optimal solution.
Since we expect most minimal N-tuples to be unique, we
believe that Theorem 6.1 is very useful for constructing
optimal files for partial match patterns. Even if a
minimal N-tuple is not the only one, we still expect it
to produce a file structure which is very close to an
optimal one.

Finally, let us note that if $(NB)^{1/N}$ is an integer,
there is only one minimal N-tuple of NB, namely, the
N-tuple $(m_1, m_2, \dots, m_N)$ where

$$m_1 = m_2 = \cdots = m_N = (NB)^{1/N}$$

In this case, we have got an optimal file for partial
match patterns. This coincides with the result obtained
by Lin, Lee and Du [1979].

We should emphasize here again that 
the multi-key hashing method does not 
require the assumption that all records 
have to be present, yet it still produces 
Cartesian product files. We can not 
guarantee that our method creates a file 
which is optimal with respect to partial 
match queries. We, however, can guarantee 
that our method is optimal with respect to 




partial match patterns. 

Section 7. Future Research 

Although we have made some progress in our research, we
must admit that our results are still not practical
because in practice, we may not be able to factor C. An
extreme case is that C might be a prime number. Even if
we successfully find $z_1, z_2, \dots$ and $z_N$, they
may not satisfy the condition that $d_i / z_i$ = an
integer. One possible solution is that we find $z_i$'s
such that

and $d_i / z_i \ge 1$ for all i.
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Bucket 2 


Bucket 3 


Bucket 4 


(a, a) 
(b,b) 
(c,c) 
(d,d) 


<a,b> 
(b,c) 
(c,d) 
<d,a) 


(a,c) 
(b,c) 
(c,a) 
(d,b) 


(a # d) 
(b,a) 
(c,b) 
(d # c) 
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U,c) 
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<c,b) 
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(c,d) 
(d,c) 
<d,d) 
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(a,*) 
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(c,*) 
Cd,*) 
(*,a) 
(*,b) 
(*,c) 
I*, a) 



lr 2, 3, 4 

1, 2, 3, 4 

If 2, 3, 4 

1, 2, 3, 4 

1, 2, 3, 4 

1. 2, 3, 4 

1, 2, 3, 4 

1. 2, 3, 4 
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1, 2 


(b,*) 


1, 2 


(c,») 


3, 4 


(d,*) 


3, 4 


(*,a) 


1, 3 


(*,b) 


1, 3 
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2, 4 


(*,d) 


2, 4 
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1.9429 
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3 
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5 
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50 


5 
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4 


1 


4 
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16 


4 
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1.5758 
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5 


4 


5 


5 
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10 


10 
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3.1646 
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6 


3 


3 


2 




18 


3 


6 


2.2414 
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7 


4 


3 
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60 


6 
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3 


5 


2 
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3 
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2.2439 


1.8293 


0.8152 


Set  d1  d2  d3  d4    NR   NB   BZ   ANB_R   ANB_CP   ANB_CP/ANB_R
  9   5   5   3         75    5   15  3.1765   2.3519      0.7080
 10   3   4   5   5    300   10   30  4.9714   2.6969      0.5425
 11   2   3   5   4    120    6   20  3.4728   2.1841      0.6289
 12   3   2   2   3     36    4    9  2.6355   2.0187      0.7660
 13   4   2   4   4    128    8   16  3.9512   2.0813      0.5267
 14   5   5   2   2    100   10   10  3.8969   2.7345      0.7020
 15   4   3   3   5    180   10   18  4.6856   2.5753      0.5496
 16   5   2   3   3     90    3   30  2.3655   1.7208      0.7275
 17   4   3   4   5    240   10   24  4.8273   2.6462      0.5482
 18   3   1   2   5     30    3   10  1.8443               0.8133
 19   4   3   5   2          10   12  4.2197   2.4686      0.5978
 20   2   2   2   4     32    4    8  2.6078   1.7647      0.6767
 21   2   3   3   4     72    3   24  2.3892   2.0299      0.8496
 22   5   5   3   3    225    9   25  4.6047   3.0343      0.6588
 23   2   3              6    3    2  2.0000   1.8000      0.9000
 24   2   4              8    2       1.6667   1.3333      0.8000
 25   2   5             10    5    2  2.5714   2.1429      0.8333
 26   4   3             12    3    4           2.1429
 27   4   4             16    4    4  3.1250               0.6400
 28   5   3             15    5    3  3.1250   2.5000      0.8000
 29   5   5             25    5    5  3.8000   3.0000      0.7895
 30   4   5             20    2   10  2.0000   1.5556      0.7778


Table 2.1 

d_i: the size of the domain of attribute A_i.

NR: the total number of records.

NB: the total number of buckets.
BZ: the block size.

ANB_R: the average number of buckets accessed per partial match query
for random files.

ANB_CP: the average number of buckets accessed if a near-optimal Cartesian
product file is used.
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2 
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£) 
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1 


(b 
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(b 
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(a 


g) 
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2 


<b 
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(c 
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Cc 


f) 






(d 


e! 
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3 


(d 


fj 






(c 


g) 






Cd 


g) 


(1, 1) 


4 
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